Method and Apparatus for Transcoding Multimedia Using Content Analysis



Field of the Invention 

The present invention relates to the delivery of multimedia content and more specifically 
to a method and apparatus for transcoding multimedia content by analyzing the content and by 
transcoding the content on the basis of the results of the analysis in order to adapt the content to 
constraints in delivery, display, processing, and storage. 

Background of the Invention 

A growing diversity of client devices is gaining access to networked servers that
distribute rich multimedia content. However, the capabilities of the devices to access, process and
display the content vary widely. While color workstations, which have high bandwidth network
connections, can readily access and display large colorful images, many hand-held computers
(HHCs), personal digital assistants (PDAs), screen phones, and smart phones can only display
small images and cannot handle video. Television-based web browsers are constrained by the
low-resolution interlaced display of television screens. Personal computers having color monitors
often achieve low data rates along dial-up network connections, thereby inhibiting access to rich
content. Given the variety of client devices, it is difficult for content publishers to anticipate and
accommodate the wide spectrum of client capabilities.

Options for content adaptation include developing multiple versions of multimedia
content, each suitable for a different class of client devices. Manually generating multiple versions
works well if the devices can be easily aggregated into a small number of classes. Alternatively,
methods can be developed that automatically generate the multiple versions of the content, such
as creating a full-resolution version of the content which can be processed to generate lower
resolution versions. The latter approach can be extended to allow content servers to
automatically generate the appropriate version of the content at the time of request. The server
can manipulate, or transcode, the existing full-resolution content, on-the-fly, to adapt it to
constraints in delivery and constraints in display, processing, and storage at the client devices.

The transcoding mechanism can be deployed in a number of ways in a networked system,
including deployment at a server or at the client. Alternatively, the transcoding system can be
deployed at a proxy which retrieves the content from the content server, manipulates it on-the-fly,
and forwards the results to the client device, as demonstrated by J. R. Smith, R. Mohan, and C.-S.
Li, in an article entitled "Transcoding Internet content for heterogeneous client devices",
published in Proc. IEEE Inter. Symp. on Circuits and Syst. (ISCAS), June, 1998. A proxy system
can optionally cache different versions of the content to speed up the transcoded content delivery.
Proxy-based transcoding systems have been developed for adapting images to client devices. Fox,
et al., developed a system for compressing images that pass through the network proxy device, as
detailed in "Adapting to network and client variability via on-demand dynamic distillation",
published in ASPLOS-VII, Cambridge, MA, October, 1996.

Other systems compress the images using a proxy implementation to speed up image
download time (see, e.g., Intel Quick Web, http://www.intel.com/Quickweb, and Spyglass Prism,
http://www.spyglass.com/products/prism).

There are many ways in which a transcoder can adapt content to the client device, such as
by data compression, summarization and media conversion. Benefits can be realized by selecting
the transcoding operations on the basis of the network conditions, publisher preferences, user
preferences and the client device capabilities. Furthermore, additional benefits could be gained by
selecting the transcoding operations on the basis of an analysis of the content as demonstrated by
J. R. Smith, R. Mohan and C.-S. Li in an article entitled "Content-based transcoding of images in
the Internet," published in Proc. of IEEE Inter. Conf. on Image Processing (ICIP-98), Chicago,
IL, Oct. 1998, and in an article entitled "Multimedia content customization for universal access,"
published in Proc. of SPIE East - Multimedia Storage and Archiving Systems III, Boston, MA,
Nov. 1998.

There are many dimensions by which the content could be analyzed in order to select the
transcoding operations. For example, the content analysis can ideally examine any of the
following: the visual, audio, or textual characteristics of the content, such as the color information
in images, the motion or scene information in video, spectral information in audio, or the
occurrence of words in text passages; the purpose of the content in the larger context of a
multimedia document, such as by identifying titles, headings, paragraphs, abstracts,
advertisements, and inter-document links; or the importance or relevance of the content in the
document or to the user, such as by identifying paragraphs related to search terms, images related
to query images, or multimedia objects related to specific semantic classes.

On the basis of the content analysis, the transcoding system could then select different
transcoding operations for different classes of content. For example, the transcoding system
could selectively compress color and black-and-white images differently; could detect audio
passages that have characteristics of speech, then convert the speech to text; could selectively
remove advertisement graphics and leave other images; or could selectively and lossily compress
objects within a multimedia document based on their relevance to a semantic topic or to search
terms in order to conserve bandwidth. By coupling the content analysis with transcoding, the
content could be better adapted to constraints in delivery, display, processing and storage.

It is, therefore, an objective of the present invention to provide a system and method for 
analyzing multimedia content prior to transcoding same for delivery. 

It is another objective of the invention to selectively transcode multimedia content based 
on content analysis. 

Summary of The Invention 

In accordance with the aforementioned and other objectives, the present invention is 
directed towards an apparatus and method for transcoding multimedia data on the basis of content 
analysis. Many possible transcoding operations can be performed on multimedia data to adapt it 
to constraints in delivery and display, processing and storage of client devices. The selection of 
specific transcoding operations can be made by first analyzing the features, purposes and 
relevances of the individual multimedia objects within the multimedia documents, then by 
selecting the transcoding alternatives according to the results of the analysis. 

Brief Description of the Drawings

The invention will hereinafter be described in greater detail with specific reference to the 
appended drawings wherein: 

Fig 1 shows a transcoding system that adapts multimedia content to the capabilities of 
client devices; 

Fig 2 shows a transcoding process by which multimedia content is broken down into
individual multimedia objects and modalities that are analyzed and transcoded separately;

Fig 3 shows the organization of multiple representations of multimedia objects into a
pyramidal data structure;

Fig 4 shows the content selection process for transcoding a multimedia document
consisting of two multimedia objects;

Fig 5 shows the association of content value scores with alternative representations of a
full-resolution video;

Fig 6 shows the association of content preference scores with alternative representations
of a full-resolution video;

Fig 7 shows the results of labeling images in a multimedia document into image type and
purpose classes;

Fig 8 shows a decision-tree for classifying images into image type classes;

Fig 9 shows examples of transcodings of an image that modify the image along the
dimensions of size, fidelity and color in order to adapt them to the client devices;

Fig 10 shows the options for deploying a transcoder at a server, proxy or client in order to
transcode multimedia documents in a networked environment;

Fig 11 shows an image transcoding proxy that analyzes and compresses images,
on-the-fly, in order to adapt them to the client devices; and

Fig 12 shows the deployment of a video transcoding system in a digital video library to
provide universal access for client devices.



Y0998393 






Detailed Description of a Preferred Embodiment of the Invention 

Figure 1 depicts one example of a networked client-server system having features of the
present invention. As depicted, one or more clients (100), proxies (104) and servers (111) are
interconnected by a network (103). Examples of networks are local area networks (LANs) or
wide area networks (WANs), e.g., an intranet, the Internet, or the World-Wide Web (WWW).
A content adaptation process analyzes and transcodes content retrieved from a server (111) in
order to adapt it to the constraints of the client devices (100). The client device (100), running a
user-application (101), accesses the content at the server (111). The user-application can make
use of a local cache (102) to store and serve previously retrieved content. The user-application
makes a request for content by communicating the request through a network (103) to a proxy
(104). The objective of the proxy is to obtain the content and deliver it back to the
user-application in a form that is suitable for the constraints of the client device (100), such as the
network, display, processing and storage constraints.

The client request is communicated to a content adaptation manager (105) at the proxy.
The content adaptation manager manages the processing at the proxy in order to satisfy the
client's request. The content adaptation manager can check the contents of a local cache (112) to
determine if the needed content has been stored locally at the proxy. Potentially, different
previously transcoded versions of the content can be stored in the proxy cache. If the needed
content is not stored in the cache, the content adaptation manager can issue a request to the
content server (111) to retrieve the needed content. Once the content adaptation manager has
obtained the content, it passes it to a transcoding system (106). According to the present
invention, the transcoding system includes a content analysis subsystem (109), a content selection
subsystem (108), and a content transcoder subsystem (107). As will be apparent to one having
skill in the relevant art, the foregoing components are representative and may be combined or
broken up into further components provided that the functionality remains.

In accordance with the inventive method, the processes running in the transcoding system
determine the mismatch between the delivery, display, processing and storage requirements of the
content and the constraints of the delivery system and of the client device, and then adapt the
content accordingly. The content analysis subsystem (109) first analyzes the content. The content
analysis can consist of many different operations including, but not limited to, classifying images
into image type, purpose and semantic classes; extracting key-frames out of video sequences;
extracting key-words out of text passages and speech transcripts; separating multimedia
documents into multimedia objects; and separating multimedia objects into constituent modalities.

The content selection subsystem (108) selects the versions and components of the content
to be transcoded, preferably by utilizing the results of the content analysis when making the
selection. For example, the content selection process may select only the images that have been
determined to be presentation content and not advertisements. The content selection process can
also optimize the overall value of the content to be delivered to the client within the constraints of
delivery, display, processing and storage as taught by C.-S. Li, R. Mohan, and J. R. Smith in
"Method for adapting multimedia content to client devices", YOR8-1998-0647. Once the
selections have been made, the content transcoder subsystem (107) can perform the transcoding
of the content. The transcoding subsystem can perform operations such as: compressing images,
audio, video and text; removing frames from video, or temporal segments from audio; converting
text to speech; converting audio to text through speech recognition; converting text from one
language to another; summarizing text passages; and so forth.

In order to perform the content selection and transcoding, the transcoding system can
make use of the content analysis results. An optional policy engine (113) can employ transcoding
rules that utilize the content analysis in order to perform the transcoding. For example, policies
can be established to perform the following functions: compress photographs and graphics
differently using the results of image type classification; remove advertisement images from
multimedia documents using the results of image purpose detection; or preferentially transcode
the text paragraphs related to particular semantic topics using the results of text analysis. In each
of these examples, the policy engine uses the results of content analysis in order to select the
appropriate transcoding operation and to select the appropriate content to be transcoded. Once
the transcoding is performed, the content is returned in the response stream to the client (100)
through the network. The client can optionally cache the returned content in the local cache
(102). In addition, the transcoding entity (the proxy of Figure 1) may optionally store the
transcoded version in anticipation of another client request from a client having the same
capabilities as the requesting client.
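
The policy examples above can be illustrated with a minimal sketch. It assumes that content analysis has already labeled each image with a type and a purpose; the class `ImageObject`, the function `select_operation` and the operation labels are hypothetical names chosen for illustration and are not taken from the described system.

```python
from dataclasses import dataclass


@dataclass
class ImageObject:
    name: str
    image_type: str  # result of image type classification: "photo" or "graphic"
    purpose: str     # result of image purpose detection, e.g. "advertisement"


def select_operation(obj: ImageObject) -> str:
    """Pick a (hypothetical) transcoding operation from the analysis results."""
    if obj.purpose == "advertisement":
        return "remove"                    # policy: drop advertisement images
    if obj.image_type == "photo":
        return "compress_as_photo"         # policy: photos and graphics are
    return "compress_as_graphic"           # compressed differently


document = [
    ImageObject("banner", "graphic", "advertisement"),
    ImageObject("news_photo", "photo", "content"),
    ImageObject("chart", "graphic", "content"),
]
plan = {obj.name: select_operation(obj) for obj in document}
print(plan)
```

In this sketch the policy is a fixed chain of rules; a policy engine of the kind described could equally load such rules from configuration.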

Referring to Figure 2, there is shown a flow diagram which is suitable for implementing
the multimedia content adaptation process (103). Upon receipt of a request, the process starts by
retrieving the multimedia content from storage (200) or from a server site. The content is then
separated into individual multimedia objects in step (201). The separation process may involve
analysis of the multimedia material to determine file formats, MIME types, and other information
that influences the separation. This processing can separate out different items in a Web
document such as text bodies, Java applets, images, animations and embedded video. After
multimedia object separation, the individual multimedia objects can be analyzed (202) and
transcoded (203) independently, can be analyzed and grouped for transcoding, or can be further
broken down into individual modalities (205).

The multimedia object analysis step (202) analyzes the multimedia objects and passes the
results onto the multimedia object transcoder step (203). The transcoded multimedia objects are
then synthesized together in step (204) to generate the transcoded multimedia content. In many
cases, the synthesis can be done asynchronously, such as in the asynchronous loading of Web
pages. In other cases, when synchronization needs to be maintained, such as for a video and its
audio track, the transcoding process may need to preserve or construct the necessary
synchronization information.

Alternatively, each multimedia object can be further separated into modal elements, which
can be performed before (not shown) or after (see step (205)) analyzing the multimedia objects.
Each individual modality, such as the text, image, video and audio of each multimedia object, can
be analyzed separately in step (206). The modality analysis subsystem can deploy specialized
analysis algorithms for each modality. For example, photograph analysis algorithms can be
utilized for visual content, and speech analysis can be utilized for audio content. The results of the
analysis can then be passed onto the modality transcoding step (207) which transcodes each
modality of the multimedia object. The transcoding can convert the input modality to a new
modality, such as text to audio, or audio to text. Alternatively, the transcoding can summarize,
compress, or elaborate on the content within the given modality of the input data, such as by
image compression, or text summarization. Once the modal elements are transcoded, they can be
synthesized together in step (208) to generate the transcoded multimedia objects. The transcoded
multimedia objects can then be synthesized together in step (204) to generate the output
transcoded multimedia content.

Referring to Figure 3, a data structure is shown in which the multiple representations of a
multimedia object can be organized into a pyramidal structure. The cells of the pyramid
correspond to different representations of the objects using different modalities such as video
(300), image (301), audio (302) and text (303) and fidelities such as in the range of full-resolution
(bottom) to low-resolution (top). A specific modal element of a multimedia object can be referred
to by one of the cells.

The transcoding can be performed on the modal element by following the transcoding
paths in Figure 3 (examples are 304, 305, 306, 307, 308, 309). By following the horizontal paths
(examples are 304 and 305), a modal element can be translated to a new modality. For example,
text can be converted to audio in path (304). Similarly, video can be converted to images in path
(305). By following the vertical paths (306, 307, 308, 309), a modal element can undergo a
change in fidelity. For example, text passages can be summarized along path (308), video can be
compressed along path (307), images can be compressed along path (306) and to a greater degree
along path (309).
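
The pyramid and its two kinds of transcoding path can be sketched as a small data structure. This is only an illustration under stated assumptions: the names `Cell`, `translate` and `reduce_fidelity`, and the limit `MAX_LEVEL`, are hypothetical, and fidelity is modeled as an integer level where 0 is the full-resolution base of the pyramid.

```python
from dataclasses import dataclass

MODALITIES = ("video", "image", "audio", "text")
MAX_LEVEL = 2  # hypothetical number of reduced-fidelity levels above the base


@dataclass(frozen=True)
class Cell:
    """One cell of the pyramid: a modality at a given fidelity level."""
    modality: str
    level: int = 0  # 0 = full resolution (bottom); larger = lower fidelity


def translate(cell: Cell, new_modality: str) -> Cell:
    """Horizontal path: change modality, keep the fidelity level."""
    if new_modality not in MODALITIES:
        raise ValueError(f"unknown modality: {new_modality}")
    return Cell(new_modality, cell.level)


def reduce_fidelity(cell: Cell, steps: int = 1) -> Cell:
    """Vertical path: move toward the top of the pyramid (lower fidelity)."""
    return Cell(cell.modality, min(cell.level + steps, MAX_LEVEL))


speech = translate(Cell("text"), "audio")          # e.g. text converted to audio
thumbnail = reduce_fidelity(Cell("image"), 2)      # e.g. image compressed twice
print(speech, thumbnail)
```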

Referring to Figure 4, there is shown an example of transcoding of a multimedia
document (400) consisting of two multimedia objects (402 and 403) using the multiple modality
and fidelity transcoding approach. The document is transcoded in step (406) by selecting a new
modality and fidelity for each of the modal elements of the multimedia objects. As shown, object
(402) in the original document (400) is replaced with object (404) in the transcoded document,
where object (404) is represented in a particular modality (i) and fidelity (j). The modality (i) can
refer to the text modality and the fidelity (j) can refer to a level of 50% summarization of the text.
Likewise, object (403) in the original document (400) can be replaced with object (405) in the
transcoded document, where object (405) is represented in a particular modality (k) and fidelity
(l).

Referring to Figure 5, there is shown an example of associating content value scores with
individual modalities (video (500), image (501), text (502) and audio (503)) and fidelities of the
multimedia objects. The content value scores, which can reflect the amount of information
contained within the objects, can be assigned subjectively by content authors. The content value
scores can be embedded in the content or stored along with it, in which case they can be
communicated from the server upon request. Alternatively, the content value scores can be
computed using functions such as those that depend on the entropy or some other measure of
information. Given an input multimedia object such as a video (500), content value scores can be
assigned to the possible representations of the video. For example, the original full resolution
video (505) may be assigned the highest content value score of "1" (505). However, the
conversion of the video to an image, text (504), or audio may result in a reduction of the content
value score. For example, when rendered as text, the content value score may be reduced to "4"
(504). Likewise, summarization or compression (507) of the video, or summarization or
compression of any of the alternative representations of the video (506, 508) using different
modalities may result in a reduction of the content value score. For example, when compressed
one level, the content value score may be reduced to "2" (507). When the image-form of the
content is compressed one level, the content value score may be reduced to "3" (506). When
further compressed, the content value score may be reduced to "4" in path (509). Similarly,
summarization of the text-form of the content (504) by one-level in path (508) may reduce the
content value score to "5".

Referring to Figure 6, there is shown an example of associating content preference scores
with the individual modalities (video (600), image (601), text (602) and audio (603)) and fidelities
of the multimedia objects. The content preference scores can be assigned subjectively by users,
authors or publishers of the content. Alternatively, the content preference scores can be derived
from the attributes of the client devices. The transcoding system can optimize the transcoding of
content by using the content value and content preference scores as taught by C.-S. Li, R. Mohan,
and J. R. Smith in "Method for adapting multimedia content to client devices", YOR8-1998-0647.
For example, a transcoding algorithm can maximize the total content value given the constraints
of the client devices, as detailed in the aforementioned Smith, et al. article. Alternatively, a
transcoding algorithm can maximize the total content preference given the constraints of the client
devices.

The constraints of the client devices may eliminate some content alternatives. For
example, a hand-held computer that cannot display video can have content value preferences that
eliminate video as indicated by "X" for video in (605, 607). The device may prefer to have video
delivered in the form of text and assign a high preference value of "2" to text (604). If the screen
is small, the device may prefer to have the text summarized one level by assigning a higher
preference of "1" to one-level lower fidelity of text (608). The device may be able to handle
some images and indicate a preference level of "3" (606) for receiving video in the form of
compressed images. Content preference is communicated by the client device in its initial request.
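
The hand-held computer example can be sketched as a lookup over preference scores. This is an illustrative simplification for a single object, not the optimization taught in the cited reference: it assumes that a lower score means a stronger preference (as with "1" being preferred over "2" above), that `None` stands for the "X" marking an eliminated alternative, and that fidelity level 0 is the full-resolution form. The names `preferences` and `best_representation` are hypothetical.

```python
from typing import Dict, Optional, Tuple

Representation = Tuple[str, int]  # (modality, fidelity level; 0 = full resolution)

# Preference scores for delivering a video to the hand-held computer.
preferences: Dict[Representation, Optional[int]] = {
    ("video", 0): None,  # "X": the device cannot display video
    ("video", 1): None,  # "X"
    ("text", 0): 2,      # video delivered as text
    ("text", 1): 1,      # text summarized one level (small screen)
    ("image", 1): 3,     # video delivered as compressed images
}


def best_representation(
    prefs: Dict[Representation, Optional[int]]
) -> Optional[Representation]:
    """Return the allowed representation with the strongest preference."""
    allowed = {rep: score for rep, score in prefs.items() if score is not None}
    if not allowed:
        return None  # every alternative was eliminated by the constraints
    return min(allowed, key=allowed.get)


print(best_representation(preferences))
```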

Referring to Figure 7, there are shown the results of analyzing a multimedia document
(700). In the analysis process, the document (700) can be separated into objects such as photos
(714), graphics (713, 715, 717) and text (716). Each of the objects can be analyzed separately
(202) as illustrated in Figure 2, or can be broken down into further constituent modal elements,
which are analyzed separately (206). The analysis process can also be designed to determine
which objects are related to each other. For example, by correlating the semantic information for
each object, the analysis process may determine that an image such as (717) is related to a text
passage such as (716). For example, in the case of text information, this correlation can be
performed by computing the similarities of term histograms for each of the objects. Once objects
are determined to be related, it is possible to then transcode them as a group. For example,
objects (716) and (717) can be transcoded together (i.e., removed together, compressed together)
as a group. Likewise, the individual modal elements of the multimedia objects can be analyzed
and transcoded as a group as illustrated in Figure 2.
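
One way to compare term histograms is sketched below. The text does not name a particular similarity measure, so cosine similarity is used here as an assumption, and the tokenizer, the function names and the sample strings are all hypothetical. For an image, the histogram would be built from whatever text is associated with it, such as a caption.

```python
import math
import re
from collections import Counter


def term_histogram(text: str) -> Counter:
    """Count lowercase alphabetic terms in a text passage."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two term histograms (0.0 if either is empty)."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


passage = term_histogram("The home team won the football match")
caption = term_histogram("Football match photo: home team celebrates")
unrelated = term_histogram("Stock prices fell sharply")

# Objects whose similarity exceeds a chosen threshold could be grouped.
print(cosine_similarity(passage, caption), cosine_similarity(passage, unrelated))
```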

In general, many different content analysis and transcoding mechanisms are possible for
multimedia documents. In particular, the benefits of using content analysis in order to perform
image transcoding can be realized for many documents published on the World-Wide Web.
Figure 7 shows the results of an image analysis system that classifies the image content in
multimedia documents on the Web into image type (701) and purpose (702) classes. The following
are examples of image type classes: T = {BWG, BWP, GRG, GRP, SCG, CCG, and CP}, where

• BWG - b/w graphic

• BWP - b/w photo

• GRG - gray graphic

• GRP - gray photo

• SCG - simple color graphic

• CCG - complex color graphic

• CP - color photo

The graphics vs. photographs categorization distinguishes between synthetic and natural
images. In many cases, the distinction between photographs and graphics is not clear for images
on the Web as detailed by V. Athitsos, M. J. Swain, and C. Frankel in an article entitled
"Distinguishing photographs and graphics on the World-Wide Web" from the Proc. IEEE
Workshop on Content-based Access of Image and Video Libraries, June, 1997. The following
are examples of image purpose classes: P = {ADV, DEC, BUL, RUL, MAP, INF, NAV, CON},
where:

• ADV - advertisement, e.g., banner ads

• DEC - decoration, e.g., background textures

• BUL - bullets, points, balls, dots

• RUL - rules, lines, separators

• MAP - maps, i.e., images with click focus

• INF - information, e.g., icons, logos, mastheads

• NAV - navigation, e.g., arrows

• CON - content related, e.g., news photos

The image type analysis can assign each image in document (700) to an image type class.
For example, image (717) is determined to be a complex color graphic (CCG) (712). Image (713)
is determined to be a simple color graphic (SCG) (704). Image (715) is also determined to be a
simple color graphic (SCG) (705). Image (714) is determined to be a color photo (CP) (706).

The image purpose analysis can assign each image in document (700) to an image purpose
class based on embedded information or analysis. For example, image (717) is determined to be a
content image (CON) (709). Image (713) is determined to be a navigational image (NAV) (707).
Image (715) is determined to be an advertisement (ADV) (711). Image (714) is determined
to be a content image (CON) (710).

The images can also be assigned to subject classes. Example subject classes include S =
{sports, weather, entertainment, news, art, architecture, music, and so forth} using the related
text-to-subject mappings shown by J. R. Smith and S.-F. Chang in "Visually searching the Web
for content", IEEE Multimedia Mag., 4(3):12-20, July - September 1997. The semantic
information can then be used in the content selection process. For example, the selection process
can select only images related to "football."

Referring to Figure 8, there is shown a decision-tree for classifying images (812) into
image type classes (805, 806, 807, 808, 809, 810, 811). The decision tree classifies the images
along the dimensions of color content (color (813), gray (815), b/w (816)), and source
(photographs, graphics). An example of each of the seven image type classes is illustrated at the
bottom of Figure 8 (805=BWG, 806=BWP, 807=GRG, 808=GRP, 809=SCG, 810=CCG,
811=CP). The image type decision tree can use five tests (800, 801, 802, 803, 804), each of
which utilizes a set of features extracted from the images. The features can be extracted only as
needed for the tests in order to minimize processing. The image features can be derived from
several color and texture measures computed from the images.

Each image $X[m,n]$ has three color components, corresponding to the RGB color
channels as follows: $X_{rgb} = (x_r, x_g, x_b)$, where $x_r, x_g, x_b \in \{0, \ldots, 255\}$. The decision tree
performs the following tests for each image $X$:

Color vs. non-color.

The first test (800) distinguishes between color (813) and non-color (814) images using
the measure of the mean saturation per pixel $\mu_s$. The saturation channel $\gamma_s$ of the image is
computed from $X$ as $\gamma_s = \max(x_r, x_g, x_b) - \min(x_r, x_g, x_b)$. Then,

$$\mu_s = \frac{1}{MN} \sum_{m,n} \gamma_s[m,n]$$

gives the mean saturation, where $M, N$ are the image width and height,
respectively. Table 1 shows the mean $E(\mu_s)$ and standard deviation $\sigma(\mu_s)$ of the saturation
measure for the set of 1,282 images. The mean saturation discriminates well between color
and non-color images since the presence of color requires $\mu_s > 0$, while strictly non-color images
have $\mu_s = 0$. However, due to noise, a small number of saturated colors often appear in
non-color images. For example, for the non-color images, $E(\mu_s) = 2.0$.



Test 1       #      $E(\mu_s)$   $\sigma(\mu_s)$
Non-color    464    2            5.6
Color        818    63           46.2

Table 1. The color vs. non-color test uses mean saturation per pixel $\mu_s$.
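
The mean saturation measure used by the first test can be sketched in a few lines. The sketch assumes an image is given as rows of (r, g, b) tuples with components in 0 to 255; the function name `mean_saturation` and the tiny sample images are hypothetical, and no decision threshold is implied by the text beyond the statistics in Table 1.

```python
def mean_saturation(image):
    """Mean over all pixels of max(r, g, b) - min(r, g, b).

    `image` is a non-empty list of rows, each a list of (r, g, b) tuples.
    """
    total = 0
    count = 0
    for row in image:
        for r, g, b in row:
            total += max(r, g, b) - min(r, g, b)  # per-pixel saturation
            count += 1
    return total / count


gray_image = [[(10, 10, 10), (200, 200, 200)]]   # strictly non-color
color_image = [[(255, 0, 0), (0, 0, 255)]]       # fully saturated pixels

print(mean_saturation(gray_image), mean_saturation(color_image))
```

A strictly non-color image yields zero, so in practice a small positive threshold would be needed to tolerate the noise mentioned above.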
B/W vs. Gray.

The second test (801) distinguishes between b/w (816) and gray (815) images using the
entropy $P_v$ and variance $V_v$ of the intensity channel $\gamma_v$. The intensity channel of the image is
computed as $\gamma_v = 0.3 x_r + 0.6 x_g + 0.1 x_b$. Then, the intensity entropy is given by

$$P_v = -\sum_{k} p[k] \log_2 p[k], \quad \text{where} \quad p[k] = \frac{1}{MN} \sum_{m,n} \begin{cases} 1 & k = \gamma_v[m,n] \\ 0 & \text{otherwise.} \end{cases}$$

The intensity variance is given by

$$V_v = \frac{1}{MN} \sum_{m,n} \left( \gamma_v[m,n] - \mu_v \right)^2, \quad \text{where} \quad \mu_v = \frac{1}{MN} \sum_{m,n} \gamma_v[m,n].$$

Table 2 shows the statistics of $P_v$ and $V_v$ for non-color images.
For b/w images the expected entropy $P_v$ is low and expected variance $V_v$ is high. The reverse is
true for gray images.



Test 2    #      $E(P_v)$   $\sigma(P_v)$   $E(V_v)$   $\sigma(V_v)$
B/W       300    1.4        1.1             11,644     4,993
Gray      164    4.8        2.1             4,196      2,256

Table 2. The b/w vs. gray test uses intensity entropy $P_v$ and variance $V_v$.
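
The two intensity statistics used by the second test can be sketched as follows. The sketch assumes the intensity channel has already been computed and is passed in as rows of integer values; the function names and the sample channels are hypothetical.

```python
import math
from collections import Counter


def intensity_entropy(channel):
    """Entropy (in bits) of the intensity histogram of a 2-D channel."""
    values = [v for row in channel for v in row]
    n = len(values)
    counts = Counter(values)  # counts[k] / n is the frequency p[k]
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def intensity_variance(channel):
    """Variance of the intensity values around their mean."""
    values = [v for row in channel for v in row]
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / n


bw_channel = [[0, 255], [255, 0]]          # two extreme levels
gray_channel = [[100, 110], [120, 130]]    # several nearby levels

print(intensity_entropy(bw_channel), intensity_variance(bw_channel))
print(intensity_entropy(gray_channel), intensity_variance(gray_channel))
```

The sample channels reproduce the pattern described in the text: the b/w channel has the lower entropy and the higher variance.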
BWG vs. BWP.

The third test (804) distinguishes between b/w graphics (805) and b/w photos (806) using
the minimum of the mean number of intensity switches in horizontal and vertical scans of the
image. The mean number of intensity switches in the horizontal direction is defined by

$$W_v^h = \frac{1}{MN} \sum_{m,n} \begin{cases} 1 & \gamma_v[m-1,n] \neq \gamma_v[m,n] \\ 0 & \text{otherwise.} \end{cases}$$

The vertical switches $W_v^v$ are defined similarly from
the transposed image $\gamma_v^T$. Then, the intensity switch measure is given by $W_v = \min(W_v^h, W_v^v)$.



Test 3    #      $E(W_v)$   $\sigma(W_v)$
BWG       90     0.09       0.07
BWP       210    0.47       0.14

Table 3. The BWG vs. BWP test uses intensity switches $W_v$.
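
The intensity switch measure can be sketched as below. The sketch assumes that a switch is counted whenever two adjacent pixels in a scan differ, and that the count is normalized by the total number of pixels; both choices, like the function names, are assumptions made for illustration.

```python
def horizontal_switches(channel):
    """Mean number of value changes between horizontally adjacent pixels."""
    rows = len(channel)
    cols = len(channel[0])
    switches = sum(
        1
        for row in channel
        for left, right in zip(row, row[1:])
        if left != right
    )
    return switches / (rows * cols)


def switch_measure(channel):
    """Minimum of the horizontal and vertical mean switch counts."""
    transposed = [list(column) for column in zip(*channel)]
    return min(horizontal_switches(channel), horizontal_switches(transposed))


stripes = [[0, 255, 0, 255], [0, 255, 0, 255]]   # vertical stripes
checkerboard = [[0, 255], [255, 0]]

print(horizontal_switches(stripes), switch_measure(stripes))
print(switch_measure(checkerboard))
```

Taking the minimum over both scan directions means a pattern such as vertical stripes, which switches often in one direction only, still scores low.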



GRG vs. GRP.



The fourth test (803) distinguishes between gray graphics (807) and gray photos (808)
using the intensity switch measure $W_v$ and intensity entropy $P_v$. Tables 3 and 4 show the mean
$E(W_v)$ and standard deviation $\sigma(W_v)$ of the intensity switch measure for the 300 b/w and 164
gray images, respectively. The switch measure distinguishes well between b/w graphics and photos
since it typically has a much lower value for b/w graphics. The gray graphics are found to have a
lower switch measure and lower entropy than the gray photos.



Test 4    #     $E(W_v)$   $\sigma(W_v)$   $E(P_v)$   $\sigma(P_v)$
GRG       80    0.4        0.26            3.3        1.8
GRP       84    0.81       0.16            0.16       1.4

Table 4. The GRG vs. GRP test uses intensity switches $W_v$ and intensity entropy $P_v$.



SCG vs. CCG vs. CP.

The fifth test (802) distinguishes between simple color graphics (809), complex color
graphics (810) and color photos (811). The images are transformed to HSV and vector
quantized, as described in [7]. The process generates a 166-HSV color representation of the
image $\gamma_{166}$, where each pixel refers to an index in the HSV color look-up table.



Test 5    #      $E(\mu_s)$   $\sigma(\mu_s)$   $E(P_{166})$   $\sigma(P_{166})$   $E(W_{166})$   $\sigma(W_{166})$
SCG       492    69.7         50.8              2.1            0.8                 0.24           0.16
CCG       116    71.2         46.2              3.1            1                   0.36           0.16
CP        210    42.5         23.5              3.3            0.7                 0.38           0.15

Table 5. The SCG vs. CCG vs. CP test uses mean saturation $\mu_s$, HSV entropy $P_{166}$ and
HSV switches $W_{166}$.

The test uses the 166-HSV color entropy and mean color switch per pixel measures. In the
computation of the 166-HSV color entropy, the probability of each color index is given by the
frequency of pixels with that index value. The color switch measure is defined as in the test
three measure, except that it is extracted from the 166-HSV color image. The test also uses the
measure of mean saturation per pixel. Table 5 shows the statistics for the saturation, entropy,
and switch measures for 818 color images. Color graphics have a higher expected saturation than
color photos. But color photos and complex color graphics have higher expected entropies and
switch measures in the quantized HSV color space.
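The color entropy computation can be illustrated with a short Python sketch. It assumes the image has already been quantized to color indices (the quantizer itself is not shown), and it uses a base-2 logarithm, which the text does not specify. The function name is hypothetical.

```python
import math
from collections import Counter
from typing import Sequence


def index_entropy(indexed: Sequence[Sequence[int]]) -> float:
    """Shannon entropy, in bits, of the color-index histogram of an image.

    Each index's probability is the fraction of pixels carrying that index.
    """
    counts = Counter(idx for row in indexed for idx in row)
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())
```

A flat single-color graphic has zero entropy, while an image that spreads its pixels evenly over many indices approaches the maximum, which is consistent with photos and complex graphics scoring higher than simple graphics in Table 5.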

Web documents often contain information related to each image that can be used to infer
information about them, as detailed in the following: N. C. Rowe and B. Frew, "Finding
photograph captions multimodally on the World Wide Web," Technical Report Code
CS/Rp, Dept. of Computer Science, Naval Postgraduate School, 1997, and J. R. Smith and S.-F.
Chang, "Visually searching the Web for content," IEEE Multimedia Mag., 4(3):12-20,
July-September 1997. An image purpose classification system can use this information in
concert with the image type information to classify the images into image purpose classes. The
system can make use of five contexts for the images in the Web documents: C = {BAK, INL,
ISM, REF, LIN}, defined in terms of HTML code as follows:

• BAK - background, i.e., <body backgr=...>

• INL - inline, i.e., <img src=...>

• ISM - ismap, i.e., <img src=... ismap>

• REF - referenced, i.e., <a href=...>

• LIN - linked, i.e., <a href=...><img src=...></a>

The system can also use a dictionary of terms extracted from the text related to the
images. The terms are extracted from the "alt" tag text, the image URL address strings, and the
text near the images in the Web documents. The system can make use of terms such as D = {
"ad", "texture", "bullet", "map", "logo", "icon"}. The system can also extract a number of
image attributes, such as image width (w), height (h), and aspect ratio (r = w/h).

The system can classify the images into the purpose classes using a rule-based decision
tree framework described by S. Paek and J. R. Smith in "Detecting image purpose in World-Wide
Web documents", from the Symp. on Electronic Imaging: Science and Technology -
Document Recognition, San Jose, CA, January 1998. The rules map the values for image type
(t), context (c), terms (d), and image attributes (w, h, r) into the purpose classes (p). The
following examples illustrate some of the image purpose rules:

• p = ADV ← t=SCG, c=REF, d="ad"

• p = DEC ← c=BAK, d="texture"

• p = MAP ← t=SCG, c=ISM, w>256, h>256

• p = BUL ← t=SCG, r>0.9, r<1.1, w<12

• p = RUL ← t=SCG, r>20, h<12

• p = INF ← t=SCG, c=INL, h<96, w<96
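The example rules above can be expressed as a small rule-based classifier. The following Python sketch is illustrative only: the `ImageInfo` record, the function name, the order in which the rules are tried, and the reading of the aspect ratio as width divided by height are all assumptions, since the text lists the rules without an evaluation order.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class ImageInfo:
    type: str      # image type class t, e.g. "SCG"
    context: str   # document context c, e.g. "INL"
    width: int     # w in pixels
    height: int    # h in pixels (assumed to be positive)
    terms: Set[str] = field(default_factory=set)  # dictionary terms d near the image


def classify_purpose(img: ImageInfo) -> Optional[str]:
    """Return the first matching purpose class, or None if no rule fires."""
    t, c, d, w, h = img.type, img.context, img.terms, img.width, img.height
    r = w / h  # aspect ratio, assumed to be width / height
    if t == "SCG" and c == "REF" and "ad" in d:
        return "ADV"
    if c == "BAK" and "texture" in d:
        return "DEC"
    if t == "SCG" and c == "ISM" and w > 256 and h > 256:
        return "MAP"
    if t == "SCG" and 0.9 < r < 1.1 and w < 12:
        return "BUL"
    if t == "SCG" and r > 20 and h < 12:
        return "RUL"
    if t == "SCG" and c == "INL" and h < 96 and w < 96:
        return "INF"
    return None
```

Because a small square inline graphic satisfies both the BUL and INF rules, the order matters; this sketch tries the more specific bullet rule first.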

In order to provide feedback about the embedded images for text browsers, the system
can generate image summary information. The summary information contains the assigned image
type and purpose, the Web document context, and related text. The system can use an image
subject classification system that maps images into subject categories using key-terms,
which is described in the aforementioned Rowe, et al. article. The summary
information can be made available to the transcoding engine to allow the substitution of the image
with text.

The system can transcode the images using a set of transcoding policies. The policies
apply the transcoding functions that are appropriate for the constraints in delivery and display,
processing and storage of the client devices.

Referring to Figure 9, the transcoding system can provide a set of transcoding functions
that manipulate the images along the dimensions of image size, fidelity, and color, and that
substitute the images with text or HTML code. For one, the transcoding can reduce the amount
of data needed to represent the images and speed up download times. The transcoding can also
reduce the size of the images in order to fit the images onto the client display screens. The
transcoder can also change the storage format of the image in order to gain compatibility with the
client device image handling methods. Some example transcoding functions include:

• Size: size reduction, crop, and subsample. For example, the full-resolution 256 x 256
image (900) can be spatially reduced to generate a smaller 192 x 192 image (901).

• Fidelity: JPEG compress, GIF compress, quantize, reduce resolution, enhance edges,
contrast stretch, histogram equalize, gamma correct, smooth, sharpen, and de-noise. For
example, the full-resolution image (900) can be compressed in addition to being spatially
reduced (901) to further reduce the amount of data to 23KB.

• Color content: reduce color, map to color table, convert to gray, convert to b/w,
threshold, and dither. For example, the 24-bit RGB color image (901) can undergo color
reduction to generate an 8-bit RGB color image with only 256 colors (902). The image
(902) can undergo further color reduction to generate a 4-bit gray image with only 16
levels of gray (903). The image can undergo even further color reduction to generate a
1-bit b/w image (904). The color reduction can further involve dithering to optimize the
photograph quality in b/w.

• Substitution: substitute attributes, text, type, purpose, and subject, and remove image.
For example, the image (900) can be replaced with the term "bridge" (905).

Table 6 illustrates some of the variability in device bandwidth, display size, display color 
and storage among devices such as workstations (906), color personal computers (PCs) (907), 
TV-based Web browsers (908), hand-held computers (HHCs) (909), personal digital assistants 
(PDAs) (910) and smart phones (911).



Client device        Bandwidth (bps)    Display size    Display color    Device storage
PDA (910)            14.4K              320 x 200       b/w              1MB
Smart phone (911)    14.4K              80 x 1          b/w              100K
HHC (909)            28.8K              640 x 480       gray             4MB
TV browser (908)     56K                544 x 384       NTSC             1GB
Color PC (907)       56K                1024 x 768      RGB              2-4 GB
Workstation (906)    10M                1280 x 1024     RGB              >4GB

Table 6. Summary of client device capabilities.
Since many devices are constrained in their capabilities, they cannot simply access image
content as-is on the Internet. For example, many PDAs (910) cannot handle JPEG images,
regardless of size, and can display only b/w images (904). The HHCs (909) cannot easily display
Web pages loaded with images because of screen size limitations. Color PCs (907) often cannot
access image content quickly over dial-up connections. The presence of fully saturated red or
white images causes distortion on TV-based Web browser (908) displays. Some smart phones
(911) cannot display any images but can display a small amount of text that can be delivered in
place of the image. In other devices such as speech-based browsers in automotive vehicles, the
text information can be rendered as speech information which can be played as audio. Other
constraints of the devices, such as the nature of the network connectivity, can also be considered.
For example, devices such as hand-held computers (HHCs), personal digital assistants (PDAs), and
smart phones that use wireless links may suffer from intermittent connectivity. In these cases, the
transcoder can consider adding redundancy to the data to protect against data loss.

In general, the transcoder framework allows the content providers to publish content at
the highest fidelity, with the system manipulating the content to adapt to the unique characteristics
of the devices. The transcoding system can employ the transcoding functions in the transcoding
policies. Consider the following example transcoding policies based upon image type and client
device capabilities:

• minify(X) ← type(X)=CP, device=HHC (909)

• subsample(X) ← type(X)=SCG, device=HHC (909)

• dither(X) ← type(X)=CP, device=PDA (910)

• threshold(X) ← type(X)=SCG, device=PDA (910)

• JPEG(X) ← type(X)=GRP, bandwidth ≤ 28.8K

• GIF(X) ← type(X)=GRG, bandwidth ≤ 28.8K

Notice that two methods of image size reduction are employed: minify and subsample.
The difference is that minify performs anti-aliasing filtering before subsampling, whereas
subsample does not filter. Minifying graphics often generates false colors during filtering and
increases the size of the file, which can be avoided by subsampling directly. For compression,
JPEG works well for gray photographs but not for graphics. For GIF, the reverse is true. When
converting color images to b/w, dithering the photographs improves their appearance, while
simply thresholding the graphics improves their readability. By performing the image type
content analysis, the system is able to better select the appropriate transcoding functions.
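The difference between the two size reduction methods can be seen on a tiny two-level graphic. The Python sketch below is an illustration under assumptions: a 2 x 2 box average stands in for the anti-aliasing filter (the text does not name a specific filter), images are lists of rows of gray levels, and the function names are hypothetical.

```python
from typing import List, Set

Gray = List[List[int]]


def subsample(image: Gray, step: int = 2) -> Gray:
    """Keep every step-th pixel in each direction, with no filtering."""
    return [row[::step] for row in image[::step]]


def minify(image: Gray, step: int = 2) -> Gray:
    """Average each step x step block (a simple box filter), then down-sample."""
    out = []
    for y in range(0, len(image) - step + 1, step):
        row = []
        for x in range(0, len(image[0]) - step + 1, step):
            block = [image[y + dy][x + dx] for dy in range(step) for dx in range(step)]
            row.append(sum(block) // len(block))
        out.append(row)
    return out


def levels(image: Gray) -> Set[int]:
    """Set of distinct intensity levels present in the image."""
    return {v for row in image for v in row}


# A two-level graphic: only the values 0 and 255 occur.
graphic = [
    [0, 0, 0, 255],
    [0, 0, 0, 255],
    [255, 255, 0, 255],
    [255, 255, 0, 255],
]
```

Subsampling this graphic keeps only the original two levels, while the filtered version introduces an intermediate level at the edges. In a palette-based graphic those intermediate values are the "false colors" that enlarge the color table and the file.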

The transcoding policies can also make use of the image purpose analysis. Consider the
following example transcoding policies:

• fullsize(X) ← purpose(X)=MAP

• remove(X) ← purpose(X)=ADV, bandwidth ≤ 14.4K

• substitute(X, "<li>") ← purpose(X)=BUL, device=PDA

• substitute(X,t) ← purpose(X)=INF, display size

The first policy makes sure that map images are not reduced in size in order to preserve the click
focus translation. The second policy illustrates the removal of advertisement images if the
bandwidth is low. The third policy substitutes the bullet images with the HTML code "<li>,"
which draws a bullet without requiring the image. A similar policy substitutes rule images with
"<hr>". The last policy substitutes the information images, i.e., logos, icons, mastheads, with
related text if the device screen is small.
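The type-based and purpose-based policies can be combined into a single selection step. The following Python sketch is one possible arrangement, not the specification's own: the function name, the string labels it returns, the precedence of purpose policies over type policies, the reading of the bandwidth conditions as "at most", and the `small_display` flag are all assumptions made for illustration.

```python
from typing import Optional

LOW_BANDWIDTH_BPS = 28_800       # threshold in the JPEG/GIF policies
VERY_LOW_BANDWIDTH_BPS = 14_400  # threshold in the advertisement policy

# Policies keyed on (image type, device).
TYPE_DEVICE_POLICIES = {
    ("CP", "HHC"): "minify",
    ("SCG", "HHC"): "subsample",
    ("CP", "PDA"): "dither",
    ("SCG", "PDA"): "threshold",
}


def select_policy(image_type: str, purpose: Optional[str], device: str,
                  bandwidth_bps: int, small_display: bool = False) -> str:
    """Return the name of the transcoding function to apply to one image."""
    # Purpose-based policies are tried first (an assumed precedence).
    if purpose == "MAP":
        return "fullsize"
    if purpose == "ADV" and bandwidth_bps <= VERY_LOW_BANDWIDTH_BPS:
        return "remove"
    if purpose == "BUL" and device == "PDA":
        return "substitute:<li>"
    if purpose == "INF" and small_display:
        return "substitute:text"
    # Type-based policies.
    if (image_type, device) in TYPE_DEVICE_POLICIES:
        return TYPE_DEVICE_POLICIES[(image_type, device)]
    if bandwidth_bps <= LOW_BANDWIDTH_BPS:
        if image_type == "GRP":
            return "jpeg"
        if image_type == "GRG":
            return "gif"
    return "none"
```

Keeping the type and device pairs in a table makes it straightforward to add policies for further device classes without touching the control flow.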
Referring to Figure 10, transcoding proxies (1004, 1005, 1006) can be deployed on the
side of the server, in the network (1002), or on the side of the client. Deployed in the network
(1005), the transcoding proxy handles the requests from the client devices (1003) for multimedia
documents and images. The proxy retrieves the documents and images from the servers (1001),
analyzes, manipulates and transcodes them, and delivers them to the devices (1003). Deployed at
the side of the server, the transcoding proxy (1004) may have direct access to the content at the
server (1001) and can transcode the content and send it to the clients (1003) through the network
(1002). Deployed on the side of the clients, the transcoding proxy (1006) can perform
transcoding on information retrieved by the clients (1003) from the servers (1001).

Referring to Figure 11, there is shown the reduction of the data by a transcoding proxy
(1101). Reducing the data sizes of the images at the transcoding proxy (1101) via image
compression, size and color reduction can result in faster end-to-end delivery, even when
accounting for the latencies introduced by the content analysis and transcoding. The transcoding
proxy (1101) can be designed to have a relatively high bandwidth between the proxy and the
content server (1100). In many cases, the proxy has a relatively low bandwidth to the client
(1002).

In the transcoding proxy system, $B_p$ gives the proxy-to-server bandwidth, $B_c$ gives the
client-to-proxy bandwidth, and $B_t$ gives the transcoder bandwidth. The terms $D_s$ and $D_t$ denote
the data sizes of original (1103) and transcoded (1104) images, respectively. The latency in
retrieving the image directly to the client is given by $L_c = D_s/B_c$. The latency in retrieving the
image via the transcoding proxy is given by $L_t = D_s/B_p + D_s/B_t + D_t/B_c$. The transcoder
results in a net speed-up by a factor $L_c/L_t \geq 1$ if the data compression ratio $D_s/D_t$ is

$$\frac{D_s}{D_t} \geq \frac{B_p B_t}{B_p B_t - B_c (B_p + B_t)}.$$

Given a relatively high proxy-to-server bandwidth of $B_p$ = [...] bps, a client-to-proxy
bandwidth of $B_c$ = 20 Kbps, and a transcoder bandwidth of $B_t$ = [...] Kbps, a data compression
ratio at the proxy of $D_s/D_t \geq$ [...] results in a net end-to-end speed-up. If the data is
compressed by a factor of $D_s/D_t = 8$, the speed-up is by a factor of $L_c/L_t$ = [...]. If
$B_p$ = [...] Kbps, the data compression ratio needs to be increased to $D_s/D_t \geq 1.8$ to have a
speed-up in delivery. In this case, data compression of $D_s/D_t = 8$ speeds up delivery by a
factor of [...].
Referring to Figure 12, there is shown a video transcoding system that can be deployed as
transcoding proxies (1201) in the Internet (1200), or can be deployed in the interface (1203) to a
digital video library (1207). The transcoder can be used to provide universal access of the digital
video library content (1205, 1206) to the client devices (1202). In many cases, the users that
patronize the digital video library conduct searches of the digital video library using a video search
and retrieval engine (1204). The search and retrieval engine (1204) can return browse data
(1206) such as thumbnail images or animations, or video data (1205) to the user. If the user is
accessing the video library via a constrained device or network connection, the video library can
utilize a transcoder (1203) to adapt the content to the device constraints. For example, when the
bandwidth is limited, the transcoder subsystem (1203) can transcode the content by allocating
more bits of information to the items that are returned highest in the search results lists, as
determined by the video search and retrieval engine (1204). Alternatively, the transcoding proxies
(1201) in the network (1200) can transcode the video data (1205) and browse data (1206) to
adapt it to the client devices.
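One simple way to give more bits to higher-ranked search results is to weight each item by the inverse of its rank. The Python sketch below is purely illustrative: the text does not specify an allocation rule, so the 1/rank weighting and the function name are assumptions.

```python
from typing import List


def allocate_bits(total_bits: int, num_results: int) -> List[int]:
    """Split a bit budget across ranked results, weighting rank k by 1/k.

    The 1/k weighting is an assumed rule; any decreasing weight would serve.
    Each share is rounded down, so the shares never exceed the budget.
    """
    weights = [1.0 / rank for rank in range(1, num_results + 1)]
    scale = total_bits / sum(weights)
    return [int(w * scale) for w in weights]
```

With this rule the top result receives the largest share and each later result receives progressively less, so a limited budget is concentrated on the items the user is most likely to view.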

The invention has been described with reference to preferred embodiments. It will be 
apparent that one having skill in the art could make modifications without departing from the 
spirit and scope of the invention as set forth in the appended claims. 
Claims

1. A computerized method for transcoding a multimedia presentation for delivery and display
comprising the steps of:

analyzing the content of the multimedia presentation; and
performing transcoding based on said analyzing.

2. The method of Claim 1 wherein said performing transcoding comprises the steps of
selecting at least one transcoding alternative based on the results of said analyzing; and
transcoding the content according to said at least one transcoding alternative.

3. The method of Claim 1 wherein said performing transcoding comprises the steps of
selecting less than all of said content for transcoding based on said analyzing; and
transcoding less than all of said content. 

4. The method of Claim 1 wherein said analyzing comprises the steps of 
separating a multimedia document into individual multimedia objects; and 
analyzing each multimedia object individually. 

5. The method of Claim 4 further comprising the steps of
separating the multimedia objects into individual modal elements; and 
analyzing each modal element of each multimedia object independently. 
6. The method of Claim 4 further comprising the steps of:

identifying relationships between individual multimedia objects within a multimedia
document; and

transcoding related multimedia objects as a group.

7. The method of Claim 5 further comprising the steps of:

identifying relationships between individual modal elements of multimedia objects; and
transcoding the related modal elements as a group.

8. The method of Claim 1, wherein the multimedia content is a document published on the 
World-Wide Web. 

9. The method of Claim 1, wherein the content analysis is performed off-line and the results 
stored embedded in or along with the multimedia content.

9. The method of Claim 1, wherein the multimedia content comprises visual content.

10. The method of Claim 9, wherein the content analysis classifies the visual content into at 
least one of image type, purpose and semantic classes. 

11. The method of Claim 10, wherein the content analysis utilizes a decision-tree for
classifying images into image type classes. 
12. The method of Claim 11 wherein the image type classes comprise color photos, color
graphics, gray photos, gray graphics, black and white photos, and black and white graphics. 

13. The method of Claim 12, wherein the content analysis procedure extracts color and 
texture features from the images. 

14. The method of Claim 13, wherein image type classification is used to select from different
methods for compression, size reduction, color reduction, substitution, and removal. 

15. The method of Claim 13, wherein image purpose classification is used to select from 
different methods for compression, size reduction, color reduction, substitution and removal. 

16. The method of Claim 1, wherein the transcoder adapts the content to the display,
processing and storage constraints of the client devices. 

17. The method of Claim 1, wherein the transcoder adapts the content to the bandwidth and
connectivity constraints of the network. 

18. The method of Claim 16, wherein the client device is a speech browser in an automotive 
vehicle. 
19. The method of Claim 16 wherein the client device is a hand-held computer.

20. The method of Claim 16 wherein the client device is a smart phone.

21. The method of Claim 17, wherein the network connection uses a wireless link to the client
device.

22. The method of Claim 21, wherein the client and network provides intermittent
connectivity between the transcoder and client device.

23. A method as in claim 1, wherein the transcoding operation manipulates the data to
generate an alternative version of it.

24. A method as in claim 1, wherein the transcoding operation selects an alternative version of
data. 

25, A system for providing transcoding of the content of a multimedia presentation 
comprising: 

a content analysis component for analyzing the content of the multimedia presentation; 
and 

at least one transcoding component for performing transcoding of the content based on 
the analyzing. 
26. The system of Claim 25 further comprising a content selection component connected to
receive input from the content analysis component and to select at least one transcoding option
based on the input; and to instruct said at least one transcoding component to perform the at least
one transcoding option.

27. A program storage device readable by machine, tangibly embodying a program of
instructions executable by the machine to perform method steps for transcoding a multimedia
presentation for delivery and display, said method comprising the steps of:

analyzing the content of the multimedia presentation; and
performing transcoding based on said analyzing.

28. The program storage device of Claim 27 wherein said performing transcoding comprises
the steps of

selecting at least one transcoding alternative based on the results of said analyzing; and
transcoding the content according to said at least one transcoding alternative.



Abstract of the Invention 



A method and apparatus for transcoding multimedia data on the basis of content analysis.
Many possible transcoding operations can be performed on multimedia data to adapt it to
constraints in delivery and display, processing and storage of client devices. The selection of
specific transcoding operations can be made by first analyzing the features, purposes and
relevances of the individual multimedia objects within the multimedia documents, then by
selecting the transcoding alternatives according to the results of the analysis. Based on the
analysis, different transcoding algorithms can be applied to different content, less than all of the
content can be transcoded, groups of multimedia objects can be transcoded, etc.
Figure 1: The transcoding system adapts the multimedia content to the capabilities
of the client devices by analyzing the content, selecting from the content and
transcoding alternatives, and transcoding the content accordingly.

Figure 2: The multimedia content can be separated into individual multimedia
objects. Each of the multimedia objects can be separated into constituent
modalities. Content analysis and transcoding can then be performed on the
individual multimedia objects or modalities independently.

Figure 3: The multiple representations of a multimedia object can be organized
into a pyramidal structure. The cells correspond to the different
representations of the object using particular modalities and fidelities. The
arrows indicate examples of transcoding paths that perform summarization
(vertical arrows) and translation (horizontal arrows).




Figure 4: Example of content selection for a multimedia document consisting
of two multimedia objects A and B, where Aij is an object with modality i
and fidelity j, and Bkl is an object with modality k and fidelity l.




Figure 5: Content value scores indicate the relative content values of the 
alternative versions of a full-resolution video (high score = 1, low score = 7). 




Figure 6: Content preference scores indicate the relative preference of 
alternative versions of a full-resolution video. 
Figure 7: Example labeling of the image type and purpose classes of the
images in a multimedia document.

Figure 8: Image type decision tree consisting of five decision points for
classifying the images into image type classes.

Figure 9: Image transcoding modifies the images along the dimensions of
size, fidelity and color in order to adapt them to the client devices.

Figure 10: Deployment of transcoding proxies (TP) at the server, in the
network and at the client for network-based transcoding of multimedia content.

Figure 11: An image transcoding proxy analyzes, manipulates and transcodes
images, on-the-fly, to adapt them to the capabilities of the client devices.

Figure 12: A video transcoding system can be deployed in the interface to a
digital video library in order to provide universal access to client devices.



